ABSTRACT


Today’s Internet scarcely resembles the mythological image 
of it as a fundamentally democratic system. Instead, users — 


In response, researchers and engineers have proposed, over
the past decade, many systems to re-democratize the Internet.
In this paper we ask: what are the goals of these systems, and what has caused
them to run aground?


1 INTRODUCTION 


Five years ago, Bruce Schneier noticed something curious 
about the state of the user-facing Internet [42]: 


Some of us have pledged our allegiance to 
Google: We have Gmail accounts, we use Google 
Calendar and Google Docs, and we have Android 
phones. Others have pledged allegiance to Ap- 
ple: We have Macintosh laptops, iPhones, and 
iPads; and we let iCloud automatically synchro- 
nize and back up everything... Some of us have 
pretty much abandoned e-mail altogether... for 
Facebook. 


In return for this allegiance, we receive distinct benefits [42]:


... We like it when we can access our e-mail anywhere, from
any computer. We like it when we can restore our 
contact lists after we’ve lost our phones. We want
our calendar entries to automatically appear on 
all of our devices. These cloud storage sites do a 
better job of backing up our photos and files than 
we would manage by ourselves... 


The de facto reality of the Internet of the 1990s and early 
2000s matched its de jure architecture: a federated network 
of many autonomous providers with little centralized control 
of services or infrastructure. Today’s Internet, while governed 
by many of the same protocols, scarcely resembles its past. 

None of this is news to networking researchers and engi- 
neers who, unsettled at the notion of becoming vassals to 
powerful companies, have designed and built numerous sys- 
tems that aim to upset this power balance and re-democratize 
the Internet. Over the past decade many such systems have 
been developed, and there has been increased interest in just 
the last few years.¹ While these diverse efforts do not have
a unified objective, they have largely aimed to overcome the 
privacy, security, and reliability challenges of a feudal inter- 
net. Yet, to date, nearly all of these efforts have failed. In this 
paper, we seek to understand why. 

What barriers remain to overthrowing the current structure 
of the Internet? We begin by considering the benefits and 
drawbacks of today’s architecture. We then coalesce the ob- 
jectives of various projects to identify requisite properties and 
fundamental components of a re-democratized Internet. We 
also examine how existing efforts aim to satisfy these prop- 
erties and the mechanisms used to do so. Finally, we discuss 
what is missing both technically and otherwise, and suggest 
directions for future research. 


2 A FEUDAL INTERNET 
What are the common oe ramme ofa EE 


The discussion in this space has become confused, so we
begin by defining our terms. 

Changes in the Internet’s structure have taken place along 
two orthogonal axes over the past few decades, yet they are often
conflated. The first axis concerns distribution (centralized
vs. distributed) and whether the resources accessed
for some service are located at a single machine (at


¹The notion of building a re-decentralized Internet has become popular: it
was a central plot device in the TV show Silicon Valley, though their system 
was made possible by a magical compression algorithm. 


one extreme) or dispersed across many machines all over
the world (at the other). The second axis concerns
control (democratic vs. feudal) and whether the authority over the
service and the machines providing a service is spread across
many individuals or organizations or held by a few.
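The Internet of today is quite different from that of a few
decades ago along both axes: it has moved from partially centralized
and semi-democratized² to widely distributed and feudal. The goal is not to
undo the necessary trend towards wide distribution, but to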
disperse control. Put another way, the scale-out design phi- 
losophy that has served us well in the design of systems over 
the past two decades must now be applied to the control of 
systems as well [39]. 


2.1 Feudal Internet Features 


Our primary focus is decentralizing administrative control 
of various systems. Before we do so hastily, we believe it is 
important to understand the reasons for today’s centralized
administrative control (which leads to a feudal Internet). Centralized
services offer users several benefits:

Convenience. Compared to running one’s own service, cloud
services are always on, accessible, fast, secure, scalable,
sharable, and easy to set up, use, and maintain.


Homogeneity. Most users are on the same set of platforms,
so users are familiar with how to use them, and have a rea-
sonable understanding about how they work. Homogeneity 
also produces important network effects, particularly in social 
applications. 

Cost. Cloud services are cheap; most services are offered for 
free to end users, at least initially. 


These considerable benefits are what make cloud services so 
popular and difficult for users to escape from despite their 


tual y, suffic ent users 


However, centralized systems are attractive not only to 
users but also to systems designers and operators because of: 


Performance. It has proved easier to scale Internet services
when they are under centralized administrative control. New 
protocols and architectures, hardware and software—all of 
which can be designed in concert to improve performance— 
can be rolled out systematically. This has proved to be the 
case in datacenter design, where a single organization can 
tailor its systems to the needs of the services it runs. 


²It may seem strange to describe the Internet of the past as partially cen-
tralized, but numerous services were indeed centralized. Consider the way 
the Web and FTP services were run in the 1990s when the average person 
didn’t and couldn’t afford to operate an always-on server. These servers 
were generally hosted by ISPs (centralized), of which there were hundreds to 
thousands (semi-democratized). 


egative vert toi user’s ee an ena „emp í 
Security. Centralized authority simplifies security policies:
instead of data coming from other (distributed) nodes that are
assumed to be untrustworthy, other nodes can be assumed
to be cooperative and trustworthy, making the design of dis- 
tributed algorithms and systems simpler. In addition, it is 
easier to uniformly and rapidly secure systems under central- 
ized command. 

Financing. The cost of aggregated infrastructure can benefit 
from economies of scale, which potentially lowers the cost of 
providing a service. Today’s financial incentives encourage 
the aggregation and monetization of users and their data. 


2.2 Goals 


A democratized Internet would need to provide the benefits 
that users have come to expect of today’s Internet: the conve- 
nience of not having to maintain one’s own infrastructure, the 
homogeneity of use across many devices and from different 
locations, and the low costs of both of these. On the other 
hand, there is little need to meet the needs of researchers and 
engineers who design and build such infrastructure—they 
(we) are responsible for designing systems and we can choose 
to build alternative systems. However, financial constraints 
are a key limiting factor for democratized Internet service 
architectures, something we return to later. 


3 RE-DEMOCRATIZING THE INTERNET 


A slew of recent systems have been developed in academia, in- 
dustry, and as open source projects in attempts to democratize 
key Internet services [2, 4, 6, 7, 14, 16, 18, 18, 22, 23, 27, 27— 
30, 34, 35, 40, 41, 43, 44, 47, 50, 52, 54]. In Table 1, we 
roughly categorize these recent systems by the central prob- 
lem(s) they aim to solve. This list is by no means exhaustive, 
but does represent a set of relatively well-known projects. 
There is some overlap, but the core problems they tackle
fall into four categories: naming, group communication, data
storage, and web applications. In this section, we
examine how recent efforts approach these problems, how
they compare with past systems, and what pieces are missing. 


3.1 Name Registration 


Three mechanisms are commonly used to represent user iden- 
tities on the Internet: public keys, personal information, and 
pseudonyms. Public-key-based identities consisting of opaque 
strings help preserve privacy and are considered relatively se- 
cure; however, such identities have faced usability barriers for 
as long as public-key cryptography has existed. Since none 
of these three basic mechanisms are simultaneously usable, 
secure, and privacy preserving by themselves, a name (or 
pseudonym) is combined with a public key to yield a secure,
human-meaningful identity. Current PKIs suffer from well-known problems
(e.g., centralized administrative control, CA compromises,


Decentralization Problem | Recent Projects
Naming | Namecoin, Emercoin, Blockstack
Group Communication (e.g., public messaging and social networking) | Matrix, Riot, Ring, Nextcloud, GNU social, Mastodon, Friendica, Identi.ca
Data storage | IPFS, Blockstack, Maidsafe, Secure-scuttlebutt, Nextcloud, Sia, Storj, Swarm, Filecoin
Web applications | Beaker, ZeroNet, Freedom.js

Table 1: Decentralization problems and examples of recent projects


WoT Sybil attacks, etc.). The literature is replete with re- 
search on identity management, naming systems, PKIs and 
the fundamental tradeoffs that exist [17, 21, 24, 46, 53]. 
Motivated by these problems and by the rise of Bitcoin [33], 
several recent decentralized alternatives to centralized PKIs 
have been developed that leverage advances in blockchain 
technology, including: Namecoin [34], Emercoin [14], and 
Blockstack [2, 3]. These blockchain-based naming schemes manage to
resolve Zooko’s Triangle [25] by providing, simultaneously,
human-meaningful, secure, and decentralized names. However,
blockchains have several weaknesses, including low bandwidth
and limited storage capacity, among
others (see [2, 31]). Since name registration does not require
high bandwidth or large amounts of data to be stored, those 
two weaknesses are mitigated in this use case, but the other 
challenges remain. 
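
The core idea of a blockchain-backed name registry can be illustrated with a minimal sketch. The code below is not the protocol of Namecoin, Emercoin, or Blockstack; it is a toy, single-node, hash-chained ledger, and the names `NameLedger`, `register`, `resolve`, and `verify` are hypothetical. It omits consensus, proof-of-work, signatures, and name expiry, and shows only how an append-only chain binds a human-meaningful name to a public key on a first-come, first-served basis.

```python
import hashlib
import json
from typing import Dict, List, Optional

GENESIS = "0" * 64  # placeholder predecessor hash for the first block


def block_hash(block: Dict[str, str]) -> str:
    """Hash a block's canonical JSON encoding."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class NameLedger:
    """Toy append-only ledger binding names to public keys (assumed design)."""

    def __init__(self) -> None:
        self.blocks: List[Dict[str, str]] = []

    def resolve(self, name: str) -> Optional[str]:
        """Return the public key bound to a name, or None if unregistered."""
        for block in self.blocks:
            if block["name"] == name:
                return block["public_key"]
        return None

    def register(self, name: str, public_key: str) -> bool:
        """Append a binding; the first registration of a name wins."""
        if self.resolve(name) is not None:
            return False
        prev = block_hash(self.blocks[-1]) if self.blocks else GENESIS
        self.blocks.append({"name": name, "public_key": public_key, "prev": prev})
        return True

    def verify(self) -> bool:
        """Check that every block commits to the hash of its predecessor."""
        prev = GENESIS
        for block in self.blocks:
            if block["prev"] != prev:
                return False
            prev = block_hash(block)
        return True
```

Because each block commits to the hash of its predecessor, rewriting an earlier binding invalidates every later block, which is what makes the record verifiable by any participant. The ledger stores only short records, consistent with the observation above that naming needs neither high bandwidth nor large storage.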


3.2 Group Communication 


For the purposes of our discussion, we consider both group 
messaging and online social networking as group communi- 
cation problems since they are roughly analogous and share 
several requirements. Public messaging or publicly-accessible 
social networks (e.g., Reddit, Twitter) exacerbate scalability, 
security, and privacy challenges. 

Group communication and sharing platforms have been


only a user’s service provider knew the user’s real identity
and there was no single authoritative entity that owned
the network or controlled its content. Usenet eventually


In addition to the features from Section 2.1,
messaging and social networking systems should provide the 
following communication-specific features: 


³This relates to an ethical debate that we do not explore in this paper.
Connectedness. Users should be able to communicate with
others in the face of node failures, node attrition, loss of 
communication channels, etc. 


Abuse Prevention. Platforms should have mechanisms that
handle abuse, however abuse is defined (e.g., spam, hate
speech, brigading, etc.). This property becomes more salient 
as the scope increases and becomes more public. 


Privacy. No identifying information about users should be
revealed to an unauthorized entity. 


Today’s most popular messaging and online social net- 
works (OSN) can achieve the first two properties due to their 
centralized infrastructural and administrative nature. How- 
ever, the connectedness of centralized platforms depends com- 
pletely on the prerogative of platform operators. For example, 
if it is no longer profitable to provide service to a user or 
they “misbehave”, access to the platform can be unequivocally
revoked and personal data rendered inaccessible.


The way centralized messaging and OSNs cope with abuse is through moderation.
However, moderation is often in direct tension with freedom
of expression, and can be influenced by governments, other
powerful organizations, or individuals in positions of authority.
Finally, it is well known that, due to the profit motive of
these platforms,

Centralized platforms have made some progress keeping
user data private from other users, but they have simultaneously
continued to violate user privacy as their monetization
strategies grow more sophisticated. For example, user data 
today is mined for social profiling, monetized directly (e.g., 
via advertising), or sold to third-party organizations. 

These numerous challenges have sparked many efforts to 
consider democratized communication platforms while at- 
tempting to maintain a comparable level of service. We cate- 
gorize these systems into two network models: socially-aware 
P2P and federated. 


Socially-aware P2P Systems 

Academic systems like PrPl [45], Persona [5], Lockr [49] 
and OTR (Off The Record messaging) [9] have mainly focused
on protecting user privacy. Rather than uploading data
to geographically distributed servers as in Usenet (which
also comes at a cost of service availability, as servers often
face a huge number of connections and temporarily refuse new
connections), PrPl [45] allows users to retain ownership over
their data by storing it on home servers or in encrypted form
on public storage providers. PrPl [45] lets users define access
levels, i.e., some users (trusted nodes or “friends”) are allowed
to access private data while others only have access to public 
data. Persona [5] and Lockr [49] provide similar functionality, 
but take this further by allowing users to define relationships 
with other users and ensuring that relationships are not exploited.
OTR [9] introduces the concepts of repudiability and
perfect forward secrecy to messaging.

In PrPl, trusted users can access the data
directly from the storage, without the need to go through “Butlers”
to maintain accessibility. Persona [5] claims to provide a
relatively high level of service availability, but data and meta- 
data are not coupled together, which can harm availability 
in the event of node failures. Lockr [49] and OTR [9] are 
designed to maintain user ownership and privacy of data at 
the cost of service availability. 

We call these systems socially-aware peer-to-peer (P2P)
systems because they require users to define social trust relationships.
Compared to P2P
systems, centralized systems have the performance, security,
and financial benefits described in Section 2.1. Socially-aware
P2P networks improve some of the security and privacy drawbacks
of traditional P2P networks since users are communicating
with other users that they trust. However, this comes at
a cost: defining these relationships can be tedious for the user, as it can be
challenging to quantify the social index value of the relationships
between nodes.


Federated Systems 
Many recent non-academic systems decentralize group
communication. These systems are all
built on a federated model.

Riot [41] is a chat application which is based on Matrix, a
federated network protocol. Matrix [30] provides high availability
by replicating data over the entire network and ensures
privacy by using end-to-end encryption techniques like
the double ratchet algorithm [37]. Although messages are
encrypted, some metadata is exposed, which slightly compromises the level
of privacy by revealing the identities of the participants of an
encrypted conversation.
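
To give a flavor of the ratcheting idea, the sketch below shows only the symmetric-key half of a double ratchet: a key-derivation chain in which each message key is derived from a chain key that is then advanced and discarded. This is an illustration under stated assumptions rather than Matrix's implementation; the function name `ratchet_step` and the one-byte constants are chosen here for illustration, and the Diffie-Hellman ratchet, message encryption, and out-of-order handling are all omitted.

```python
import hashlib
import hmac
from typing import Tuple


def ratchet_step(chain_key: bytes) -> Tuple[bytes, bytes]:
    """Advance a symmetric KDF chain by one step.

    Returns (next_chain_key, message_key). Both are derived from the
    current chain key with HMAC-SHA256 under distinct constants, so a
    message key cannot be used to recover the chain key.
    """
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


def derive_message_keys(initial_chain_key: bytes, count: int) -> list:
    """Derive `count` successive message keys, discarding old chain keys."""
    keys = []
    chain_key = initial_chain_key
    for _ in range(count):
        chain_key, message_key = ratchet_step(chain_key)
        keys.append(message_key)
    return keys
```

Since each step is one-way, a party that deletes old chain keys cannot decrypt earlier messages even if its current state is later compromised. Note that this protects message contents only; it does nothing to hide who is talking to whom.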


GNU Social [18] is also a federation-based social networking
application that relies on OStatus [36] for federation.
OStatus allows real-time exchange of messages between
nodes, but there are no intrinsic privacy mechanisms; privacy 
must be implemented at the application level. Mastodon [29] 
also runs on OStatus [36] and provides similar functionality
to GNU Social, but also allows federations to define their own 
rules on abuse (e.g., racism, sexism, xenophobia, violence, 
gender discrimination, etc.). Unlike Matrix [30], OStatus- 
based applications are bottlenecked by single servers that can 
cause entire instances to be inaccessible if they fail. 
Identi.ca [23] and Friendi.ca [16] are based on the popular 
federated stream server, pump.io [38]. Pump.io [38] makes 
it easy to disseminate information in the network, and uses 
OAuth to restrict unauthorized access to private data. Differ- 
ent servers are capable of providing their own functionalities. 
Friendi.ca [16] provides its own application-level privacy 
measures through private one-to-one and group messaging, 
expiring old data and giving users ownership of their data. 


3.3 Data storage 


Storing data is a fundamental function of many Internet ser- 
vices. Although messaging systems also technically store 
data, for the purposes of this discussion we consider systems 
that primarily focus on storage. Compared with centralized 
storage systems, decentralized systems potentially provide 
cheaper storage services (since users already have devices), 
and are potentially resistant to censorship and unauthorized 


access. However, despite these advantages, scaling decentralized storage systems remains challenging.


There is a large amount of literature, mostly from the era
in which peer-to-peer systems were popular, on distributed
storage systems [10, 12, 20, 26, 51]. To improve perfor-


than administratively democratized storage and do not ac- 
count for malicious nodes. 

In contrast, blockchain mechanisms are completely decen- 
tralized; many storage systems build on top of blockchains 
in some way. Table 2 summarizes some recent systems that 
provide decentralized storage. We observe that, with the ex- 
ception of IPFS and MaidSafe, many of these decentralized 
storage systems use blockchains to publicly record contracts 
and to facilitate payments. Here a contract is an object that 
defines a service agreement between two parties: storage 
providers and consumers. The exact contents of this agree- 
ment vary from system to system, but, generally, it contains 
information about storage and retrieval (e.g., how much data 
should be stored, how often the data should be retrieved), pricing,
and proof-of-storage requirements. This use case, as in
naming, uses blockchains for their intended purpose (as a slow,
but consistent and verifiable public ledger) while minimizing
the impact of their weaknesses.
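
As a rough illustration of what such a contract object might contain, the sketch below models the fields named above (amount of data, retrieval frequency, pricing, and proof requirements) as an immutable record. The class name `StorageContract`, every field name, and the 720-hour month are assumptions made for this example; none of the surveyed systems is claimed to use this schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageContract:
    """Hypothetical service agreement between a provider and a consumer."""

    provider: str               # identifier of the storing node
    consumer: str               # identifier of the paying node
    size_bytes: int             # how much data should be stored
    retrievals_per_month: int   # how often the data should be retrieved
    price_per_gb_month: float   # pricing, in an unspecified currency unit
    duration_months: int        # length of the agreement
    proof_interval_hours: int   # how often a proof of storage is due

    def total_price(self) -> float:
        """Total payment owed over the life of the contract."""
        gigabytes = self.size_bytes / 10**9
        return gigabytes * self.price_per_gb_month * self.duration_months

    def proofs_required(self, hours_per_month: int = 720) -> int:
        """Number of storage proofs the provider must submit in total."""
        return (self.duration_months * hours_per_month) // self.proof_interval_hours
```

A record of this kind is small and rarely updated, which is why publishing it on a slow but verifiable ledger is a reasonable fit, while the stored data itself stays off-chain.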


System | Blockchain Usage | Incentive Scheme
IPFS | None | Bitswap Ledgers
MaidSafe | None | Proof-of-resource, Distributed transaction
Sia | Store contract | Proof-of-storage
Storj | Facilitate payments (storjcoin) | Proof-of-retrievability
Swarm | Ethereum blockchain for domain name resolution, payments, and content availability insurance | Proof-of-storage: SWEAR
Filecoin | Facilitate payments (filecoin) | Proof-of-replication, Proof-of-spacetime, Proof-of-work
Blockstack | Bind domain name, public key and zone file hash | N/A

Table 2: Comparison of Surveyed Storage Systems.


In P2P storage systems nodes must contribute storage and 
bandwidth and cooperate with each other to store and serve 


their own performance. To address this, systems
such as Filecoin [27], Sia [50], MaidSafe [28], Storj [52], and
Swarm [47] use blockchains to build in incentives for data
storage. Essentially, nodes that wish to store and retrieve data 
pay other nodes for storing and serving data for them. Nodes 
are therefore incentivized to contribute storage and bandwidth 
and to cooperate (and compete) with each other to make 
the storage system function as a whole. Blockchain mecha- 
nisms such as proof-of-work have inspired many variations 
of storage-focused proof-of-work mechanisms such as: proof- 
of-storage, proof-of-retrievability, proof-of-replication, and 
proof-of-spacetime [27, 47, 50, 52]. Proof-of-Replication, for
example, allows a node to convince others that it is storing
exactly the number of copies it has claimed,
instead of creating multiple identities and storing data just
once (Sybil Attacks), fetching data from others (Outsourcing Attacks),
or generating data on demand (Generation Attacks) [27].


In our table, the only exception is
Blockstack, which does not focus on decentralizing storage;
its users are free to use the data store of their choice, such as one from
a cloud provider.
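
The intuition behind proof-of-storage style mechanisms can be sketched with a simple challenge-response exchange. The code below is a deliberately naive construction, not the scheme used by Sia, Storj, Swarm, or Filecoin: the data owner precomputes a few single-use challenges before handing the data off, and later checks that the storing node can still hash the full data under a fresh nonce. The function names `make_challenges`, `prove`, and `verify` are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Dict


def make_challenges(data: bytes, count: int) -> Dict[bytes, bytes]:
    """Owner side: precompute nonce -> expected digest, then discard the data."""
    challenges = {}
    for _ in range(count):
        nonce = os.urandom(16)
        challenges[nonce] = hashlib.sha256(nonce + data).digest()
    return challenges


def prove(stored_data: bytes, nonce: bytes) -> bytes:
    """Storage node side: answer a challenge using the data it holds."""
    return hashlib.sha256(nonce + stored_data).digest()


def verify(challenges: Dict[bytes, bytes], nonce: bytes, proof: bytes) -> bool:
    """Owner side: check a proof; each challenge can be used only once."""
    expected = challenges.pop(nonce, None)
    return expected is not None and hmac.compare_digest(expected, proof)
```

Because the nonce is unpredictable, a node cannot answer without access to the data at challenge time. This sketch does not, however, stop a node from fetching the data from someone else on demand or from claiming several replicas while storing one, which are precisely the attacks that the more elaborate proofs listed above are designed to address.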


3.4 Web Applications 


Today, the web service platforms owned by a few large com- 
panies, such as Google, Amazon, and Microsoft, provide 
most of the necessary components for the web (storage, data- 
base, computation, content delivery, management, etc.) to web 
applications and guarantee high service quality, e.g., Ama- 
zon Elastic Compute Cloud (Amazon EC2) promises 99.95% 
availability for each Amazon EC2 Region [13].


Consequently, many web services are further bound to a few
large web service providers, and thus censoring of or censorship
by the service providers is sufficient to disrupt the web
applications running on them. 

Recent decentralized systems replace the traditional client-server
web architecture with a novel browser-based web architecture.
This browser-based web architecture typically leverages existing
technologies and combines them with
mechanisms to allow users to easily create, modify, and share
hostless web applications. 

Freedom.js [43], for example, uses a browser-based web 
architecture where a web application, including its back-end 
logic, runs entirely in a web browser. Three types of APIs
(identity, storage, and transport) are provided to application
developers. It leverages existing techniques, e.g., WebRTC is
used to establish a direct peer-to-peer connection to transmit 
data, and a reliable DHT can be selected to store data glob- 
ally. ZeroNet [54] is a decentralized web platform in which 
web applications are seeded and served by visitors via the 
BitTorrent protocol. When a new web application is created,
the application developer gets a
public/private key pair. The public key is the new site address, which
can be looked up on trackers or DHTs, and every file of and
update to the web application can be securely verified by
checking the corresponding signature. The public key is also
a standard Bitcoin address for accepting donations and pay- 
ments directly to the web application. Beaker [6] is a tailored 
web browser enabling users to create and host websites di- 
rectly from browsers. Like ZeroNet [54], resources in Beaker 
are served and distributed in a peer-to-peer network. What dis- 
tinguishes Beaker is that, motivated by Git, it explicitly allows 
forking and merging web applications, advocating openness 
at the code level. 
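
The pattern of using a public key as a site address and verifying content against it can be sketched with the standard library alone. The code below substitutes a Lamport one-time signature, built only from SHA-256, for the Bitcoin-compatible keys that ZeroNet is described as using above; it is a stand-in chosen so the example is self-contained, not ZeroNet's actual scheme. All function names are hypothetical, and a Lamport key must sign only one message, so a real system would need a different scheme or a key-rotation mechanism.

```python
import hashlib
import os
from typing import List, Tuple

KeyPairs = List[Tuple[bytes, bytes]]  # 256 pairs, one pair per digest bit


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair() -> Tuple[KeyPairs, KeyPairs]:
    """Return (private, public); the public key is the hash of each secret."""
    private = [(os.urandom(32), os.urandom(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _bits(message: bytes) -> List[int]:
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private: KeyPairs, message: bytes) -> List[bytes]:
    """Reveal one secret per digest bit (one-time use only)."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public: KeyPairs, message: bytes, signature: List[bytes]) -> bool:
    """Check that each revealed secret hashes to the matching public value."""
    if len(signature) != 256:
        return False
    return all(
        _h(secret) == public[i][bit]
        for (i, bit), secret in zip(enumerate(_bits(message)), signature)
    )


def site_address(public: KeyPairs) -> str:
    """Derive a self-certifying site address from the public key."""
    return hashlib.sha256(b"".join(a + b for a, b in public)).hexdigest()
```

A visitor who knows only the site address can check that a served public key hashes to it, and then that a file's signature verifies under that key, so content can be fetched from untrusted peers without trusting any of them.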


4 INFRASTRUCTURE FEASIBILITY 


Infrastructure is often overlooked by designers of the sys- 
tems we have considered in this paper; it is assumed that the 
resources to run such services exist. As a basic sanity check,
here we perform a back-of-the-envelope calculation. We
compare the resources of today’s cloud infrastructure with 
the currently-unproductive capacity of distributed infrastruc- 
ture (e.g., personal devices). We focus on three resources: 1) 
bandwidth, 2) compute, and 3) storage. For simplicity, we 
focus on a specific provider’s resources, Google, and then 
scale up to estimate global capacity. No public data exists on 


Resource | Cloud Infrastructure | User Devices
Bandwidth | 200 Tbps | 5000 Tbps
Cores | 400 M | 500 M
Storage | 80 EB | 210 EB

Table 3: Estimated capacity of global cloud infrastructure
and unused user resources (server-equivalent cores).


Google’s network or compute capacity. Various reports from 
a few years ago [19, 32] estimate that Google has about 1 
million servers and 10 EB of storage. We might extrapolate 
that today Google has about 100 million cores and 20 EB 
of storage. One recent estimate [48] puts the rate
of Internet traffic at a little over 200 Tbps in 2016. Since
Google estimates that it handles one quarter of the Internet’s
traffic [15], we scale up these figures by a factor of 4, yielding
an estimated 400 million cores and 80 EB of storage for global
cloud infrastructure.


Mobile devices such
as smartphones and tablets cannot be relied upon to do compute
given battery constraints. Thus we estimate that only
storage (about 210 EB) is available across all devices. For
compute, we take the 4 billion cores available across personal 
computers and reduce their estimated capacity by a factor 
of 8 to account for weaker processors (versus server CPUs) 
and to allow for power management, yielding 500 million 
server-equivalent cores. Finally, we must estimate the bandwidth
available across these devices. Assuming devices are
connected to the Internet via a slow broadband connection
with 1 Mbps upstream bandwidth, in the case of
personal computers, and slow 3G connections that also have
1 Mbps upstream bandwidth for mobile devices, this yields
5000 Tbps. We summarize these estimates in
Table 3. Roughly speaking, there is sufficient unproductive capacity among user devices.
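
The arithmetic behind Table 3 can be reproduced in a few lines. The inputs below are the figures quoted in this section; the variable and function names are ours, and the device count is not stated in the text but is the value implied by dividing the 5000 Tbps estimate by the assumed 1 Mbps of upstream bandwidth per device.

```python
# Inputs quoted in the text (all are rough estimates).
GOOGLE_CORES = 100e6          # extrapolated Google core count
GOOGLE_STORAGE_EB = 20        # extrapolated Google storage, in exabytes
GOOGLE_TRAFFIC_SHARE = 0.25   # fraction of Internet traffic Google handles
INTERNET_TRAFFIC_TBPS = 200   # estimated rate of Internet traffic in 2016
PC_CORES = 4e9                # cores across personal computers
DERATING_FACTOR = 8           # weaker CPUs and power management
USER_STORAGE_EB = 210         # storage available across all devices
USER_BANDWIDTH_TBPS = 5000    # user-device bandwidth from Table 3
UPSTREAM_MBPS = 1             # assumed upstream bandwidth per device


def scale_to_global(provider_value: float, traffic_share: float) -> float:
    """Scale one provider's resources up to a global cloud estimate."""
    return provider_value / traffic_share


cloud_cores = scale_to_global(GOOGLE_CORES, GOOGLE_TRAFFIC_SHARE)
cloud_storage_eb = scale_to_global(GOOGLE_STORAGE_EB, GOOGLE_TRAFFIC_SHARE)
user_cores = PC_CORES / DERATING_FACTOR

# Implied, not stated in the text: 1 Tbps equals 1e6 Mbps.
implied_devices = USER_BANDWIDTH_TBPS * 1e6 / UPSTREAM_MBPS

# Ratio of unused user-device capacity to cloud capacity, per resource.
ratios = {
    "bandwidth": USER_BANDWIDTH_TBPS / INTERNET_TRAFFIC_TBPS,
    "cores": user_cores / cloud_cores,
    "storage": USER_STORAGE_EB / cloud_storage_eb,
}

if __name__ == "__main__":
    for resource, ratio in ratios.items():
        print(f"{resource}: user devices / cloud = {ratio:.3g}x")
```

Under these assumptions the margin is widest for bandwidth and narrowest for compute, so the compute estimate is the most sensitive to the chosen derating factor.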


5 DISCUSSION 


Infrastructure feasibility is only a preliminary sanity check 
that democratizing Internet services is possible. Clearly, there 
are many difficult challenges in terms of performance and 
robustness of decentralized approaches. However, we believe 
that blockchains, like BitTorrent and DHTs before them, are
key components of recent democratized services, and are a
promising avenue of research that we should work on. Improving
blockchains and leveraging their properties for applications
that do not mind their weaknesses are important avenues for
future work. However, we also believe that we need work that
goes beyond blockchains to overthrow Internet feudalism. 


Does there exist enough unproductive capacity among user
devices to meet this resource demand?
5.1 Easy Problems


Studying the performance and security of blockchain- 
based systems: hacker communities dedicated to such efforts 
have made headway in developing many blockchain-based 
systems, but have typically neglected performance evaluation 
and a thorough evaluation of their security models under new 
requirements (e.g., when used to back a storage system). 
Doing what systems researchers do best: design, build, and 
evaluate new systems and primitives for building performant 
decentralized systems. 


Federated approaches are an ideal stepping stone
from today’s feudal model, in that they allow explicit control
of the granularity of a domain. However, many of these systems
have not been architected with canonical systems goals
in mind, such as fault tolerance. 


5.2 Moderate Problems 


Overcoming the mismatch between researcher/engineer
objectives and user needs: systems often solve hard or exciting
problems when users’ needs and desires are more mundane
(e.g., getting systems researchers to attend to usability of
complex systems).


Bridging the gap between the academic and hacker communities:
many recent systems, such as federated
group communication systems, do not provide significant
privacy features, and often do not leverage the latest thinking 
from the academic network security and privacy community. 
We can build mechanisms or toolkits that can be plugged in 
by the hacker community in their projects. 

Grappling with infrastructure quality vs. quantity: as we 
discuss above, there exists more than enough unproductive 
capacity among user devices worldwide. However, the qual- 
ity of this infrastructure is much poorer than what a typical 
datacenter provides. As such, systems must be designed to cope with this lower-quality infrastructure.


hours go into building Google, Facebook, etc., so building
alternatives may require similar engineering efforts. 


approaches may include “guerrilla” tactics such as running
encrypted services on the cloud.

Preventing the re-emergence of feudalism: while there is a
long road to re-democratizing the Internet in the first place,
systems must prevent backsliding to the feudal model. Unfor- 
tunately, like the other problems in this class, this may not be 
an entirely technical problem as c 